NSF PAR Search | NSF Public Access Repository

https://doi.org/10.1007/s10664-025-10616-2

Perez-Rosero, Salomé; Dyer, Robert; Flint, Samuel W; McIntosh, Shane; Srisa-an, Witawas (May 2025, Empirical Software Engineering)

Many software engineering maintenance tasks require linking a commit that induced a bug with the commit that later fixed that bug. Several existing SZZ algorithms provide a way to identify the potential commit that induced a bug when given a fixing commit as input. Prior work introduced the notion of a "work item", a logical grouping of commits that could be a single unit of work. Our key insight in this work is to recognize that a bug-inducing commit and the fix(es) for that bug together represent a "work item." It is not currently understood how these work items, which are logical groups of revisions addressing a single issue or feature, could impact the performance of algorithms such as SZZ. In this paper, we propose a heuristic that, given an input commit, uses information about changed methods to identify related commits that form a work item with the input commit. We hypothesize that given such a work item identifying heuristic, we can identify bug-inducing commits more accurately than existing SZZ approaches. We then build a new variant of SZZ that we call Work Item Aware SZZ (WIA-SZZ), that leverages our work item detecting heuristic to first suggest bug-inducing commits. If our heuristic fails to find any candidates, we then fall back to baseline variants of SZZ. We conduct a manual evaluation to assess the accuracy of our heuristic to identify work items. Our evaluation reveals the heuristic is 64% accurate in finding work items, but most importantly it is able to find many bug-inducing commits. We then evaluate our approach on 818 repositories that have been previously used to study the performance of SZZ, comparing our work against six SZZ variants. That evaluation shows an improvement in F1 scores ranging from 2% to 9% overall. When considering only the subset of cases where work items were identified, the improvement increases from 3% to 14%.

Free, publicly-accessible full text available May 1, 2026

Statically typed languages offer numerous benefits to developers, such as improved code quality and reduced runtime errors, but they also require the overhead of manual type annotations. To mitigate this burden, language designers have started incorporating support for type inference, where the compiler infers the type of a variable based on its declaration/usage context. As a result, type annotations are optional in certain contexts, and developers are empowered to use type inference in these situations. However, the usage patterns of type annotations in languages that support type inference are unclear. These patterns can help provide evidence for further research in program comprehension, in language design, and for education. We conduct a large-scale empirical study using Boa, a tool for mining software repositories, to investigate when and where developers use type inference in 498,963 Kotlin projects. We choose Kotlin because it is the default language for Android development, one of the largest software marketplaces. Additionally, Kotlin has supported declaration-site optional type annotations from its initial release. Our findings reveal that type inference is frequently employed for local variables and variables initialized with method calls declared outside the file are more likely to use type inference. These results have significant implications for language designers, providing valuable insight into where to allow type inference and how to optimize type inference algorithms for maximum efficiency, ultimately improving the development experience for developers.

Search for: All records